Abstract: The Ethernet over E1 approach, which takes advantage of widely deployed telecom networks, is
an efficient and economical way to interconnect two Ethernets in different regions. Two Ethernet over E1
schemes, namely a byte granularity scheme and a frame granularity scheme, are discussed. The byte
granularity scheme partitions Ethernet frames into several pieces for transmission and has a strict
requirement on the maximum delay difference of multiple E1 links. To solve this problem, the newly proposed
frame granularity scheme transmits each frame separately through the E1 links without any partitioning. The
architecture designs of both schemes are presented. This paper evaluates the throughput and delay
performances of both schemes, both analytically, using results calculated from delay models, and
experimentally, using test results from a field programmable gate array (FPGA) implementation. Although the
frame granularity scheme has a slightly worse delay performance, it has a higher throughput, and is the only
choice able to overcome large delay differences of the E1 links.
Introduction

The Ethernet has undoubtedly become the most popular
technique in local area networks (LANs) for both its
simplicity and low cost. The widespread usage of the
Ethernet technology in LAN environments has forced
the telecom operators to consider it as the only possible
technology for metropolitan area network (MAN) access
services to the public and businesses [1]. Consequently,
there has been much recent interest in Ethernet
interconnection methods.

Several schemes have been proposed to take advantage
of currently widely deployed synchronous optical
network (SONET)/synchronous digital hierarchy
(SDH) networks. The Ethernet over SONET/SDH
method was developed to connect two Ethernets in
different regions using SDH [2]. The multi-protocol
label switching (MPLS) protocol has been chosen by the
Metro Ethernet Forum [3] to adapt Ethernet traffic to
SONET/SDH networks, a sophisticated scheme that
provides not only Ethernet interconnection but also
Internet access. A scheme for connecting several
Ethernets in a ring topology utilizing SONET/SDH links
was subsequently modeled and analyzed using stochastic
theory [4]. However, for users demanding less bandwidth,
the Ethernet over E1 approach, which makes use of
several E1 links to provide Ethernet interconnection, is
more appealing, on account of both its low cost and ease
of use. The device used in the Ethernet over E1 approach
is called a reverse multiplexer, as it always adapts a
high rate data stream to low speed channels.

The Ethernet over E1 scheme is shown in Fig. 1.
Reverse multiplexer A connects LAN A to multiple
leased E1 links. Reverse multiplexer B connects LAN
B to the other end of the E1 links. The communication
between the two LANs is as follows. Reverse multiplexer
A receives Ethernet frames from LAN A and packs them
into several E1 frames, which are then transmitted to the
E1 links. At the other end, Reverse multiplexer B receives
E1 frames from the E1 links, unpacks them to get the
Ethernet frames transmitted from LAN A, and sends
these frames to LAN B. Similarly, LAN B's Ethernet
frames can be sent to LAN A. Thus, LAN A and LAN
B are interconnected through multiple E1 links and
two reverse multiplexers.



Fig. 1 Ethernet over E1 (LAN A, Reverse multiplexer A, E1 links, Reverse multiplexer B, LAN B)

From the point of view of the reverse multiplexer, 
the directly connected LAN is called the local Ethernet, 
and the other LAN is called the remote Ethernet. Tak- 
ing Fig. 1 as an example, LAN A is Reverse multi- 
plexer A's local Ethernet, and LAN B is its remote 
Ethernet. 

There are two Ethernet over E1 schemes, the byte
granularity scheme [3] and the frame granularity scheme.
The byte granularity scheme has already been widely
used for several years, while the frame granularity
scheme is newly proposed to fix the delay difference
problem of the byte granularity scheme.

1 Two Ethernet over E1 Schemes

The byte granularity scheme and our newly proposed
frame granularity scheme will be explained in this
section. The essential difference between these two
Ethernet over E1 schemes is in the granularity with
which they manage multiple E1 links. We use Fig. 1 as
the communication model in the following section.
Here, assume that the number of E1 links leased is N.

In the byte granularity scheme, on receiving an
Ethernet frame, Reverse multiplexer A partitions it into
N pieces, and transmits them on different E1 links. At
the other end, Reverse multiplexer B collects these N
pieces, and recovers the Ethernet frame for LAN B.
Due to delay differences between the E1 links, Reverse
multiplexer B cannot recover the transmitted Ethernet
frame until the last piece arrives. As a result, the
reverse multiplexers have a strict requirement on the E1
links: the maximum delay difference of the E1 links
cannot exceed the delay limitation, which is set by
most designers as 8 ms or 16 ms [5].
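
The byte-level partitioning and the wait for the last piece can be illustrated with a short Python sketch. It is only an illustration under simplifying assumptions: the names `stripe_bytes`, `reassemble`, and `recovery_delay_ms` are hypothetical, bytes are dealt to the N links strictly in turn, and buffering, encapsulation, and E1 framing are ignored.

```python
from typing import List


def stripe_bytes(frame: bytes, n_links: int) -> List[bytes]:
    """Deal the bytes of one frame to n_links links in turn.

    Byte number b (counting from 0) goes to link b mod n_links.
    """
    return [frame[k::n_links] for k in range(n_links)]


def reassemble(pieces: List[bytes]) -> bytes:
    """Interleave the per-link pieces back into the original byte order."""
    n_links = len(pieces)
    out = bytearray(sum(len(p) for p in pieces))
    for k, piece in enumerate(pieces):
        out[k::n_links] = piece
    return bytes(out)


def recovery_delay_ms(link_delays_ms: List[float]) -> float:
    """The frame cannot be rebuilt before the piece on the slowest link arrives."""
    return max(link_delays_ms)
```

Because reassembly needs every piece, the receiver must buffer the faster links until the slowest one delivers, which is why the delay difference has to stay within a fixed limit.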

Our newly proposed frame granularity scheme is
designed to overcome the delay limitation of the byte
granularity scheme. Instead of being partitioned into N
pieces, each frame is wholly transmitted through one
E1 link. On receiving an Ethernet frame from LAN A,
Reverse multiplexer A selects one idle E1 link in a
round-robin way, and sends the whole frame through
that link. At the other end, Reverse multiplexer B
collects each transmitted frame from each separate E1
link. The delay differences of the E1 links, therefore,
have no impact on recovering the Ethernet frames, as
the links independently transmit their own frames and
do not need to wait for each other. A detailed
scheduling algorithm will be given later, together with
the required hardware architecture.

It must be mentioned that delay differences of the
E1 links still affect the frame granularity scheme.
Successive frames will suffer different delays as they
travel on separate links, so that the order of the frames
is not guaranteed in the frame granularity scheme.
Fortunately, our experimental data reveal that Ethernet
frame order is not a concern, as the receiver's media
access controller (MAC) is able to reorder the frames
for high layer applications. However, an impact on
delay performance is inevitable, and it will be analyzed
later.
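
The loss of frame order can be seen in a toy model. The following Python sketch is not the scheduler of the reverse multiplexer; it assumes that every link is idle when its turn comes, that frames are handed to the scheduler at a fixed interval, and that transmission time is ignored. The name `arrival_order` and the parameter `frame_gap_ms` are hypothetical.

```python
from typing import List


def arrival_order(n_frames: int, link_delays_ms: List[float],
                  frame_gap_ms: float = 1.0) -> List[int]:
    """Return frame sequence numbers in the order they reach the far end.

    Frame k is sent at time k * frame_gap_ms on link k mod N (round robin)
    and arrives after that link's propagation delay.
    """
    n_links = len(link_delays_ms)
    arrivals = []
    for seq in range(n_frames):
        link = seq % n_links
        arrivals.append((seq * frame_gap_ms + link_delays_ms[link], seq))
    return [seq for _, seq in sorted(arrivals)]
```

With equal link delays the frames arrive in sending order; with one slow link (for example `arrival_order(4, [10.0, 0.0])`) the frames carried by the fast link overtake those on the slow one.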

2 Architecture Design 

2.1 Hardware architecture 

The reverse multiplexer is a device that connects a
local Ethernet to multiple E1 links. On the Ethernet side,
it interacts with the local Ethernet through a media
independent interface (MII) as specified in IEEE
standard 802.3 [6]. On the E1 side, N E1 links are provided
for communicating with a telecom network.

Figure 2 shows the hardware architecture of a
reverse multiplexer, where the number of E1 links is set
at 8. The scheme for mapping Ethernet frames to E1
frames operates as follows.






Tsinghua Science and Technology, February 2007, 12(1): 70-76 



Fig. 2 Hardware architecture of the reverse multiplexer
(1) The Ethernet receiver receives Ethernet frames
through the MII, checks them for cyclic redundancy check
(CRC) errors, length errors, and dribbling bit errors,
discards those with errors or those destined for the local
Ethernet, and forwards the remaining frames for the
remote Ethernet to FIFO1. The address filter manages
the MAC address table with learning and aging
functions. The table keeps all the MAC address entries of
the local Ethernet by learning the source address of
each frame. To save link bandwidth, Ethernet frames
whose destination address is in the table are not
transmitted to the E1 links, as they are destined for the
local Ethernet. In addition, out-of-date entries are deleted
from the table.

(2) The generic frame procedure (GFP) packer
encapsulates the Ethernet frames according to the GFP
specified in ITU-T Recommendation G.7041 [7]. The
encapsulated Ethernet frames, called GFP frames, are
then sent to the transmit scheduler.

(3) The transmit scheduler assigns the GFP frames
to eight E1 transmit buffers (tFIFO1-tFIFO8) in a
round robin way. In the byte granularity scheme, once
a byte is assigned, the transmit scheduler selects the
next non-full buffer, and writes the next byte into it. In
the frame granularity design, the transmit scheduler
changes the link every frame. The transmit scheduler
assigns the first frame to tFIFO1, the second frame to
tFIFO2, and so on. In the byte granularity design, the
transmit buffers are negligible, because each buffer
only holds one byte. In contrast, in the frame
granularity design, each buffer needs to be larger than
the biggest GFP frame.

(4) The E1 transmitters (E1 transmitter 1 to E1
transmitter 8) get the GFP data from their respective
transmit buffers, pack them into E1 frames, and send them
to the E1 links.
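
The learning and aging behavior of the address filter in step (1) can be sketched in Python as follows. This is a minimal illustration, not the FPGA logic: the class name `AddressFilter`, its methods, and the 300-second default aging time are assumptions introduced here, and MAC addresses are represented as plain strings.

```python
from typing import Dict


class AddressFilter:
    """Toy MAC address table with learning and aging (hypothetical interface)."""

    def __init__(self, max_age_s: float = 300.0) -> None:
        self.max_age_s = max_age_s          # assumed aging time, not from the paper
        self._last_seen: Dict[str, float] = {}

    def _age(self, now_s: float) -> None:
        """Delete entries that have not been refreshed within max_age_s."""
        stale = [mac for mac, t in self._last_seen.items()
                 if now_s - t > self.max_age_s]
        for mac in stale:
            del self._last_seen[mac]

    def should_forward(self, src_mac: str, dst_mac: str, now_s: float) -> bool:
        """Learn the source address, then decide whether the frame goes to the E1 links.

        A frame whose destination is a known local address stays on the local
        Ethernet; every other frame is forwarded to the remote side.
        """
        self._age(now_s)
        self._last_seen[src_mac] = now_s
        return dst_mac not in self._last_seen
```

A frame addressed to a station that has already been seen as a source on the local Ethernet is filtered out, and the entry disappears again once the station has been silent for longer than the aging time.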



The reverse operation, mapping E1 frames to
Ethernet frames, is carried out as follows.

(1) The E1 receivers (E1 receiver 1 to E1 receiver 8)
receive E1 frames from the E1 links. In the byte granularity
scheme, the E1 receivers store the E1 frames in their
respective receive buffers (rFIFO1-rFIFO8) for delay
alignment. In the frame granularity scheme, the E1
receivers directly extract the GFP frames from the E1
frames, and store them in their respective receive buffers.

(2) The receive scheduler reads data from the eight 
receive buffers, and sends the recovered GFP frames to 
the GFP unpacker, where the GFP frames are un- 
packed to get the Ethernet frames. 

(3) The Ethernet frames forwarded to FIFO2 by the
GFP unpacker are then read out by the Ethernet
transmitter, and transmitted to the LAN through the MII.

2.2 Design details 

First, FIFO1 and FIFO2 are asynchronous FIFOs
whose read and write clocks are not synchronous.
Coding the memory addresses using Gray codes helps to
avoid full and empty misjudgments [8].
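
The property that makes Gray codes useful for pointers crossing a clock domain is that consecutive values differ in exactly one bit, so a pointer sampled during a transition is either the old or the new value. The following Python sketch shows the standard binary-reflected Gray code conversion; the function names are chosen here for illustration and the hardware itself would of course be written in a hardware description language.

```python
def to_gray(n: int) -> int:
    """Convert a binary count to its binary-reflected Gray code."""
    return n ^ (n >> 1)


def from_gray(g: int) -> int:
    """Convert a Gray code value back to the binary count."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def bits_changed(a: int, b: int) -> int:
    """Number of bit positions in which two code words differ."""
    return bin(a ^ b).count("1")
```

For a FIFO whose depth is a power of two, the one-bit property also holds when the pointer wraps from the last address back to zero.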

The second is the issue of virtual concatenation. In
the byte granularity scheme, the virtual concatenation
technique in SDH [9] is used to synchronize the different
E1 links. Thus, the E1 frames are needed to carry the
time information for the delay alignment, which
increases the overhead.

The last thing to mention concerns the GFP
encapsulation protocol. Several encapsulation protocols are
currently in use, such as GFP, link access procedure-
SDH (LAPS) [10], and point to point protocol (PPP)/
high level data link control (HDLC) [11]. Experimental
studies reveal that when the frame length is relatively
short, the frame loss rates of the GFP and LAPS methods
are nearly the same, regardless of the link bandwidth.



CHEN Wentao et al: Performance Analysis of Two Ethernet over E1 Schemes



However, for longer frame lengths, the frame loss rate
of the GFP method is much lower than that of LAPS [12].
Compared with PPP/HDLC, GFP does not inflate the
data length in a non-deterministic manner, and has a
more robust frame delineation mechanism [12]. All these
traits are important in our choice of GFP as our
encapsulation protocol.

3 Performance Evaluation 

3.1 Throughput performance 

Each E1 frame has 32 slots, and each E1 link can carry
a maximum data rate of 2.048 Mb/s. Thus, 8 links can
provide a throughput of no more than 16.384 Mb/s.

In the byte granularity scheme, to accommodate the
8-ms delay difference, one extra slot is needed as a
time stamp. The maximum throughput of the byte
granularity system is, therefore, somewhat smaller than
that of the frame granularity system. In our
implementation, the 1st slot is used for E1 synchronization, the
2nd slot is used as a time stamp, and the remaining 30
slots carry the payload. As a result, each E1 link can
carry a data rate of 1.920 Mb/s, and the throughput of a
device with 8 links is therefore 15.360 Mb/s.

In the frame granularity system, there is no need to
carry time information in the E1 frames, so only the 1st
slot is used for E1 frame synchronization. Thus, the
data rate of each E1 link is 1.984 Mb/s, and 8 links
provide a throughput of 15.872 Mb/s.
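
These figures follow directly from the slot structure: 2.048 Mb/s divided over 32 slots gives 64 kb/s per slot. A short Python check of the arithmetic is given below; the constant and function names are chosen here for illustration.

```python
SLOTS_PER_FRAME = 32
SLOT_RATE_BPS = 2_048_000 // SLOTS_PER_FRAME  # 64 kb/s per E1 time slot


def link_payload_rate_bps(overhead_slots: int) -> int:
    """Payload rate of one E1 link when overhead_slots slots carry no payload."""
    return (SLOTS_PER_FRAME - overhead_slots) * SLOT_RATE_BPS


def aggregate_rate_bps(overhead_slots: int, n_links: int = 8) -> int:
    """Total payload rate of n_links E1 links."""
    return n_links * link_payload_rate_bps(overhead_slots)


byte_scheme = aggregate_rate_bps(overhead_slots=2)   # sync slot + time stamp slot
frame_scheme = aggregate_rate_bps(overhead_slots=1)  # sync slot only
```

The single time stamp slot therefore costs the byte granularity scheme 8 x 64 kb/s = 0.512 Mb/s of aggregate throughput.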

3.2 Delay performance 

The ping program provided by the Windows® operating
system is used to test the round trip delay of the
system. Our analytical model is built on this basis, and
serves as a guide for testing the two schemes. The
longest Ethernet frame is 1518 bytes, and the maximum
payload is 1500 bytes, of which 8 bytes are overhead for
mapping IP packets to Ethernet frames.
Thus, an IP packet that is longer than 1492 bytes must
be fragmented into several pieces in order to fit into
the Ethernet container. Assume that the IP packet
length is L in bytes, where $L \le 65\,535$. If L is smaller
than 1492, then this packet will be packed in one
Ethernet frame. Otherwise, this packet will be
fragmented into $\lfloor L/1492 \rfloor + 1$ pieces, where
$\lfloor x \rfloor$ means the largest integer not exceeding x.
Of these pieces, $\lfloor L/1492 \rfloor$ pieces are of length
1492 bytes, and the remaining piece is of L (mod 1492)
bytes. These Ethernet frames are then encapsulated in GFP
frames for transmission to the E1 links. This will introduce
some GFP overhead. Here, however, we neglect the GFP
overhead for two reasons. First, GFP overhead is
negligible for long IP packet lengths. Second, both the
byte and frame granularity schemes require the same
overhead. Neglecting the GFP overhead, therefore, has
no impact on comparing these two schemes.
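
The fragmentation rule can be written out as a small Python function. This is a sketch of the counting rule stated in the text only, not of a real IP stack; the name `fragment_lengths` is hypothetical, and the rule is applied literally, so a packet whose length is an exact multiple of 1492 bytes yields a final piece of zero bytes.

```python
from typing import List

MAX_PIECE_BYTES = 1492   # largest piece that fits one Ethernet frame in this model
MAX_PACKET_BYTES = 65535


def fragment_lengths(packet_len: int) -> List[int]:
    """Lengths of the pieces an IP packet of packet_len bytes is split into."""
    if not 0 < packet_len <= MAX_PACKET_BYTES:
        raise ValueError("packet length must be between 1 and 65535 bytes")
    if packet_len < MAX_PIECE_BYTES:
        return [packet_len]                       # fits in one Ethernet frame
    full, rest = divmod(packet_len, MAX_PIECE_BYTES)
    # floor(L/1492) full pieces plus one piece of L mod 1492 bytes
    return [MAX_PIECE_BYTES] * full + [rest]
```

For example, a 3000-byte packet gives two pieces of 1492 bytes and one of 16 bytes, and the largest packet of 65 535 bytes gives 44 pieces.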

The number of E1 links is set at 8, as shown in Fig.
2. Figure 3 shows a delay model of the Ethernet over
E1 system. Computer A sends an IP packet to
Computer B. The packet is fragmented into several Ethernet
frames, and transmitted to Reverse multiplexer A.
The transmit scheduler of Reverse multiplexer A
assigns the traffic to eight E1 links either in byte
granularity or in frame granularity, according to the scheme
adopted. $t_{ld1}$ to $t_{ld8}$ are the delays of the E1 links. For
example, every bit transmitted through the 1st E1 link
suffers a delay of $t_{ld1}$ before it arrives at Reverse
multiplexer B, where the receive scheduler collects the
frames transmitted by Computer A and forwards them
to Computer B. Here, $t_{D1}$ denotes the total delay each bit
suffers as it goes from Computer A to the transmit
scheduler of Reverse multiplexer A. Similarly, $t_{D2}$ denotes
the total delay from the receive scheduler of Reverse
multiplexer B to Computer B.

According to the delay model above, each IP packet
sent from Computer A to Computer B suffers three
kinds of delay: the delay from Computer A to the transmit
scheduler of Reverse multiplexer A ($t_{D1}$), the delay
introduced by the E1 links, denoted as $t_{ld}$, and the delay
from the receive scheduler of Reverse multiplexer B to
Computer B ($t_{D2}$). The total packet delay from
Computer A to Computer B is

$$t_D = t_{D1} + t_{ld} + t_{D2} \quad (1)$$

Fig. 3 Delay model of the system

Similarly, the packet delay from Computer B to
Computer A is

$$r_D = r_{D1} + r_{ld} + r_{D2} \quad (2)$$

where $r_{D1}$ is the delay from Computer B to the transmit
scheduler of Reverse multiplexer B, $r_{ld}$ is the delay
introduced by the E1 links, and $r_{D2}$ is the delay from
the receive scheduler of Reverse multiplexer A to
Computer A.

As a result, the round trip delay is

$$D = t_D + r_D = t_{D1} + t_{ld} + t_{D2} + r_{D1} + r_{ld} + r_{D2} \quad (3)$$

In the byte granularity scheme, the transmit
scheduler of Reverse multiplexer A schedules the GFP
frames in byte granularity. Assume that the current
byte goes to link i, where i is an integer between 1 and
8. Then the next byte goes to link j, where
$j \equiv i + 1 \pmod{8}$.

Thus, each 8 consecutive bytes go to different E1
links. Each GFP frame is thus separated into
approximately 8 equal pieces, and each piece goes to a different
E1 link. Now consider that an IP packet of length L is
transmitted by Computer A. After fragmentation, it
becomes $\lfloor L/1492 \rfloor$ Ethernet frames of 1518 bytes each,
plus another Ethernet frame of L (mod 1492)+26
bytes. These Ethernet frames are then encapsulated into
GFP frames, segmented into 8 pieces, and transmitted
to the E1 links. The receive scheduler of Reverse
multiplexer B cannot recover the GFP frames until the last
piece arrives. So the delay introduced by the E1 links
is the maximum delay of all the links. According to
Eqs. (1)-(3), we can obtain the packet round trip delay
$D_b$ in the byte granularity scheme as

$$t_{Db} = t_{D1} + t_{D2} + \max(t_{ld1}, t_{ld2}, \ldots, t_{ld8}) + \frac{V}{R} \quad (4)$$

$$r_{Db} = r_{D1} + r_{D2} + \max(r_{ld1}, r_{ld2}, \ldots, r_{ld8}) + \frac{V}{R} \quad (5)$$

$$D_b = t_{Db} + r_{Db} \quad (6)$$

where R is the total data rate of the E1 links, and V
is the number of bits of all GFP frames generated from
the IP packet.
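
As a numerical illustration of the byte granularity delay, the following Python sketch evaluates the one-way delay as the sum of the end delays, the largest link delay, and V/R. It is a sketch under stated assumptions: the function and parameter names are hypothetical, GFP overhead is neglected as in the text, and the default R = 15.36 Mb/s is the byte granularity throughput from Section 3.1.

```python
from typing import List

FULL_FRAME_BYTES = 1518      # Ethernet frame carrying a full 1492-byte piece
PIECE_BYTES = 1492
LAST_FRAME_EXTRA_BYTES = 26  # added to the L mod 1492 bytes of the last piece


def total_bits(packet_len: int) -> int:
    """V: number of bits in all frames generated from one IP packet."""
    full, rest = divmod(packet_len, PIECE_BYTES)
    return 8 * (full * FULL_FRAME_BYTES + rest + LAST_FRAME_EXTRA_BYTES)


def byte_one_way_delay_s(packet_len: int, link_delays_s: List[float],
                         total_rate_bps: float = 15.36e6,
                         d1_s: float = 0.0, d2_s: float = 0.0) -> float:
    """One-way delay in the byte granularity scheme (all times in seconds)."""
    return (d1_s + d2_s + max(link_delays_s)
            + total_bits(packet_len) / total_rate_bps)
```

With all link delays set to zero, a 1000-byte packet gives V = 8208 bits and a one-way delay of about 0.53 ms; any difference between link delays adds directly through the `max` term.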

In the frame granularity system, the transmit
scheduler of Reverse multiplexer A schedules the GFP
frames in frame granularity. Each frame in a set of 8
consecutive frames goes to a different E1 link. Each
GFP frame thus goes through a separate E1 link.
Consider again that an IP packet of length L is
transmitted by Computer A. After fragmentation, it becomes
$\lfloor L/1492 \rfloor$ Ethernet frames of 1518 bytes each, plus
another Ethernet frame of L (mod 1492)+26 bytes.
These Ethernet frames are then encapsulated into GFP
frames, and assigned to 8 E1 links. The receive
scheduler of Reverse multiplexer B collects these GFP
frames from separate links, recovers the Ethernet
frames, and sends them to Computer B. Each GFP
frame goes through a separate E1 link, so some E1
links may be idle when there are fewer than eight frames
to be transmitted. In this scheme, the E1 links
independently transmit their own GFP frames. Frames
going through different links suffer different delays.

First, calculate the amount of data assigned to each
E1 link. Based on the link delays, we can determine
which link is the latest to forward all its traffic to
Computer B.

Let $j = \lfloor L/1492 \rfloor + 1$, where j is the number of GFP
frames an IP packet of length L will be encapsulated
into. In the scheme, these frames are assigned to eight
links in a round robin way. Define $L_i$ as the number
of bytes assigned to link i. Then $L_i$ is given by the
following equations:

$$L_i = \begin{cases} 1518(c+1), & 1 \le i < i_0; \\ 1518c + L \pmod{1492} + 26, & i = i_0; \\ 1518c, & i_0 < i \le 8 \end{cases} \quad (7)$$

with c and $i_0$ given by

$$c = \left\lfloor \frac{j-1}{8} \right\rfloor, \quad i_0 = (j-1) \pmod{8} + 1.$$

The packet delay is the maximum delay of all links.
According to Eqs. (1)-(3), we get the packet round trip
delay $D_f$ in the frame granularity scheme:

$$t_{Df} = t_{D1} + t_{D2} + \max\left(t_{ld1} + \frac{L_1}{r}, t_{ld2} + \frac{L_2}{r}, \ldots, t_{ld8} + \frac{L_8}{r}\right) \quad (8)$$

$$r_{Df} = r_{D1} + r_{D2} + \max\left(r_{ld1} + \frac{L_1}{r}, r_{ld2} + \frac{L_2}{r}, \ldots, r_{ld8} + \frac{L_8}{r}\right) \quad (9)$$

$$D_f = t_{Df} + r_{Df} \quad (10)$$

where r is the data rate of each E1 link, and $L_i$ is expressed in bits when it is divided by r.
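
The per-link load of Eq. (7) and the resulting one-way delay can be checked with a short Python sketch. The function and parameter names are hypothetical, GFP overhead is neglected as in the text, the byte counts are converted to bits before dividing by the link rate, and the default r = 1.984 Mb/s is the per-link rate from Section 3.1.

```python
from typing import List


def bytes_per_link(packet_len: int, n_links: int = 8) -> List[int]:
    """L_i of Eq. (7): bytes assigned to each link (index 0 is link 1)."""
    n_frames = packet_len // 1492 + 1            # j
    c = (n_frames - 1) // n_links                # complete rounds of full frames
    i0 = (n_frames - 1) % n_links + 1            # link that carries the last frame
    last_frame = packet_len % 1492 + 26
    loads = []
    for i in range(1, n_links + 1):
        if i < i0:
            loads.append(1518 * (c + 1))
        elif i == i0:
            loads.append(1518 * c + last_frame)
        else:
            loads.append(1518 * c)
    return loads


def frame_one_way_delay_s(packet_len: int, link_delays_s: List[float],
                          link_rate_bps: float = 1.984e6,
                          d1_s: float = 0.0, d2_s: float = 0.0) -> float:
    """One-way delay in the frame granularity scheme (all times in seconds)."""
    loads = bytes_per_link(packet_len, len(link_delays_s))
    slowest = max(d + 8 * b / link_rate_bps
                  for d, b in zip(link_delays_s, loads))
    return d1_s + d2_s + slowest
```

For a 3000-byte packet only three links are used (1518, 1518, and 42 bytes), which illustrates why part of the aggregate bandwidth can stay idle in this scheme.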

According to the data from the throughput analysis,
let R = 15.36 Mb/s, r = 1.984 Mb/s, $t_{ldi} = 0$,
$t_{D1} = 0$, $t_{D2} = 0$, $r_{D1} = 0$, and $r_{D2} = 0$. We can then
determine the round trip delay of various IP packets,
as shown in Fig. 4. For the same packet length, the
byte granularity scheme has a slightly better
performance than the frame granularity scheme, because the
link bandwidth is better utilized in the byte
granularity scheme, as links work together to transport the
data, and no link will be idle if there are frames to be
transmitted. However, in the frame granularity
scheme, links transport the frames independently.
Some links, therefore, might be idle, while others are
working.




Fig. 4 Round trip delay (horizontal axis: packet length)

For certain specific packet lengths, there is less de- 
lay in the frame granularity scheme than in the byte 
granularity scheme, because the throughput of the 
frame granularity scheme is a little larger than that of 



the byte granularity scheme, where one extra slot is 
needed as a time stamp. Thus, IP packets with specific 
lengths can fully utilize the bandwidth of the frame 
granularity scheme, resulting in a smaller delay. 

The last point to note concerning Fig. 4 is that for
the same packet length, the packet delay in the frame
granularity scheme exceeds that of the byte granularity
scheme by no more than 12 ms, although it is larger
in most cases.

4 Test Results 

Field programmable gate array (FPGA) implementations
of both schemes have been realized. The test
environment is built as shown in Fig. 5. Two computers are
connected using two reverse multiplexers plus 16 E1
cables. The 16 E1 cables provide 8 E1 links.



Fig. 5 Test environment

Computer A sends as many Ethernet frames as possible
to Computer B, and we record the amount of
data received by Computer B as the throughput. The
transmission of Ethernet frames of different lengths
has been tested. The throughput results for both
schemes are shown in Fig. 6. The results agree quite
well with our expectations. The throughput is
approximately 15.36 Mb/s for the byte granularity
scheme, and 15.87 Mb/s for the frame granularity
scheme.



Fig. 6 Throughput results (horizontal axis: frame length (bytes); curves: frame granularity, byte granularity)

The difference in delay time for the two schemes has 
also been investigated. For this, Computer A pings 
Computer B with packets of different lengths, and the 
round trip delays are recorded. The delay results for 
the two schemes are shown in Fig. 7. The form of Fig.
7 is similar to that of Fig. 4, except for the cases of
small packet lengths. The discrepancy has two causes.
First, in calculating the data for Fig. 4,
we assumed $t_{D1} = 0$, $t_{D2} = 0$, $r_{D1} = 0$, and $r_{D2} = 0$.
These delays cannot, however, be neglected for small
packet lengths. Second, the round trip delays are
recorded by the ping program in units of milliseconds.
The smaller the delay, the more significant will be the
error in the data.

Fig. 7 Round trip delay results (horizontal axis: packet length (kB), 0 to 32)

5 Conclusions 

Based on both the analysis and experimental results,
we have compared two schemes for an Ethernet over
E1 implementation. The conventional byte granularity
scheme manages multiple E1 links at a finer granularity
than the frame granularity scheme, and so it may be
expected to perform better as a result. However, when
the delay differences of multiple links are taken into
account, the byte granularity scheme is somewhat
penalized for this finer management. First, the byte
granularity scheme needs extra bandwidth to carry the time
information, which results in a decreased throughput.
To accommodate greater delay differences, more
bandwidth and a buffer are required. Secondly,
although the byte granularity scheme has a better delay
performance, the delay difference between the two
schemes does not exceed 12 ms for any
given packet length.

As a result, each scheme has certain advantages, 
based on the link's quality and on the user's expecta- 
tions of quality of service. When the delay differences 
of the multiple links are small, the byte granularity 
scheme offers better delay performance, whereas the 
frame granularity scheme offers a larger throughput. 



Otherwise, the frame granularity scheme is the only
choice able to overcome large delay differences,
while providing a slightly higher throughput at the same
time.